Conversation

Contributor

@TianHao324 TianHao324 commented Nov 20, 2025

This PR adds support in the CANN backend for the out_prod (outer product) operator on F32 and F16 floating-point tensors.

@noemotiovon noemotiovon added the Ascend NPU issues specific to Ascend NPUs label Nov 20, 2025
@TianHao324 TianHao324 changed the title cann supports out_prod operator for F32 and F16 CANN: supports out_prod operator for F32 and F16 Nov 20, 2025
@TianHao324
Contributor Author

Test result: (screenshot attached)

Collaborator

@noemotiovon noemotiovon left a comment


LGTM, just a minor issue.


const int64_t i12 = i2;
const int64_t i13 = i3;
aclTensor *accumulator = ggml_cann_create_tensor(
Collaborator


The result of ggml_cann_create_tensor should be acl_tensor_ptr, not aclTensor*.

Contributor Author


*/
void ggml_cann_out_prod(ggml_backend_cann_context & ctx, ggml_tensor * dst);

void ggml_cann_out_prod_fp(ggml_backend_cann_context & ctx, ggml_tensor * dst);
Collaborator


No need.

Contributor Author


#include <aclnnop/aclnn_index_select.h>
#include <aclnnop/aclnn_clamp.h>
#include <aclnnop/aclnn_threshold.h>
#include <aclnnop/aclnn_ger.h>
Collaborator


You should use
find ggml/src/ggml-cann -iname "*.cpp" -o -iname "*.h" | xargs clang-format -i
to format the code.


dst->nb,
2);

GGML_CANN_CALL_ACLNN_OP(ctx, InplaceZero, accumulator);
Collaborator


Currently, InplaceZero is being called on each iteration of the for loop. I believe we can just call it once on dst before the loop.


@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Nov 20, 2025
@TianHao324 TianHao324 force-pushed the out_prod branch 3 times, most recently from 815e770 to 5d9578a on November 20, 2025 at 11:45
@noemotiovon
Collaborator

Thank you for your contribution! :)

Collaborator

@hipudding hipudding left a comment


I think the current computation method is not optimal. The outer product of two 1-D vectors can be computed by broadcasting them and multiplying element-wise. The broadcast can be implemented by modifying the view (similar to how it's done in operators like Add), followed by an element-wise multiplication.

Ignore my previous comment. aclnnGer is actually the outer product; the CANN documentation description is incorrect.

ggml_type_size(dst->type), dst->ne, dst->nb, 2);

// The outer product needs to be accumulated in this dimension.
for (int64_t i1 = 0; i1 < ne11; i1++) {
Collaborator

@hipudding hipudding Nov 25, 2025


There are three nested loops here, which will result in very poor performance. Optimization should be done once an appropriate operator is available.

acl_tensor_ptr acl_out = ggml_cann_create_tensor(output_buffer, ggml_cann_type_mapping(dst->type),
ggml_type_size(dst->type), dst->ne, dst->nb, 2);

GGML_CANN_CALL_ACLNN_OP(ctx, Ger, acl_input.get(), acl_weight.get(), acl_out.get());
Collaborator

@hipudding hipudding Nov 25, 2025


This operator is intended for computing the outer product, but it uses the inner product here, which is unsuitable.

Ignore my previous comment. aclnnGer is actually the outer product; the CANN documentation description is incorrect.

@hipudding hipudding merged commit 064c90d into ggml-org:master Nov 25, 2025
77 checks passed
@hipudding
Collaborator

Does the 310P support aclnnGer? If not, please update the supported-ops function.

@TianHao324
Contributor Author

Does the 310P support aclnnGer? If not, please update the supported-ops function.

I will look into this; if the 310P does not support it, I will address it in a follow-up PR.
